Indexing Finite Language Representation of Population Genotypes

Authors

  • Jouni Sirén
  • Niko Välimäki
  • Veli Mäkinen

Abstract

We propose a way to index population genotype information together with the complete genome sequence, so that the index can be used to efficiently align a given sequence to the genome while taking all plausible genotype recombinations into account. This is achieved by converting a multiple alignment of individual genomes into a finite automaton that recognizes all strings that can be read from the alignment when switching from one sequence to another is allowed at any position. The finite automaton is indexed with an extension of the Burrows-Wheeler transform to allow pattern search inside the plausible recombinant sequences. The size of the index stays limited because individual genomes are highly similar. The index finds applications in variation calling and in primer design.
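
As a rough illustration of the automaton idea in the abstract, the following Python sketch builds a small automaton from a gapped multiple alignment and checks whether a pattern can be read along some path through it. It is not the authors' construction or their index: the function names `build_automaton` and `occurs`, the `(column, character)` node representation, and the rule that rows may be switched only where they share a node are all assumptions made for this example, and the search is a naive set-based simulation rather than a Burrows-Wheeler-based one.

```python
from collections import defaultdict


def build_automaton(rows):
    """Build an automaton from equal-length aligned rows ('-' marks a gap).

    Assumption for this sketch: a node is a (column, character) pair, so
    rows that agree in a column share a node there, and a path may move
    from one row to another at any shared node.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    nodes = set()
    edges = defaultdict(set)
    for row in rows:
        prev = None
        for col, ch in enumerate(row):
            if ch == "-":
                continue  # gaps contribute no node; the edge skips over them
            node = (col, ch)
            nodes.add(node)
            if prev is not None:
                edges[prev].add(node)
            prev = node
    return nodes, edges


def occurs(nodes, edges, pattern):
    """Return True if `pattern` labels some path in the automaton."""
    if not pattern:
        return True
    # Start from every node whose label matches the first character.
    current = {node for node in nodes if node[1] == pattern[0]}
    for ch in pattern[1:]:
        # Follow all outgoing edges whose target carries the next character.
        current = {
            nxt
            for node in current
            for nxt in edges.get(node, ())
            if nxt[1] == ch
        }
        if not current:
            return False
    return bool(current)


if __name__ == "__main__":
    alignment = ["ACGTA", "ACCTA", "A-GTT"]
    nodes, edges = build_automaton(alignment)
    # "ACCTT" is in no single row, but follows row 2 and then row 3.
    print(occurs(nodes, edges, "ACCTT"))
    print(occurs(nodes, edges, "GG"))
```

In this toy alignment, the pattern `ACCTT` does not occur in any individual sequence but is found as a recombination of the second and third rows, which is the kind of match the index described in the abstract is meant to support at genome scale.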




Publication date: 2011